Automatically-Extracted Thesauri for Cross-Language IR: When Better Is Worse

Author

  • Ralf D. Brown
Abstract

A statistical algorithm for extracting bilingual term dictionaries (thesauri) from parallel text is presented, along with refinements for improving their size and accuracy. Somewhat paradoxically, increasing the accuracy of the extracted thesaurus can in fact reduce the performance of an IR system that uses it to translate queries for cross-language information retrieval.


Similar resources

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...

Similarity Thesauri and Cross-Language Retrieval

This paper describes a method for constructing a thesaurus automatically from a corpus of suitable documents, using standard information retrieval methods. The resulting thesauri can be used for user-initiated query expansion, automatic query expansion, as well as cross-language retrieval. Researchers at the Swiss Federal Institute of Technology in Zürich developed and evaluated this method in ...

Machine Generation of Thesauri: Adapting to Evolving Vocabularies in Design Documentation

A new breed of engineering design tools are electronic design notebooks, which are electronic versions of the traditional engineer’s logbook. They capture design information as it is generated, providing a rich, unfiltered history of a design project. This presents great potential for accessing past design decisions and rationale. This paper examines ways of searching for design information by ...

An Association Thesaurus for Information Retrieval

Although commonly used in both commercial and experimental information retrieval systems, thesauri have not demonstrated consistent benefits for retrieval performance, and it is difficult to construct a thesaurus automatically for large text databases. In this paper, an approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-...

Publication date: 1998